 |
State machine replication Totally Explained
|
|  |
|
NEW! |
All the latest news in the worlds of
computer gaming,
entertainment,
the environment,
finance,
health,
politics,
science,
stocks & shares,
technology
and much,
much,
more.
|
Everything about State Machine Replication totally explained==Introduction== » Introduction from Schneider's 1990 survey:
"Distributed software is often structured in terms of clients and services. Each service comprises one or more servers and exports operations that clients invoke by making requests. Although using a single, centralized, server is the simplest way to implement a service, the resulting service can only be as fault tolerant as the processor executing that server. If this level of fault tolerance is unacceptable, then multiple servers that fail independently must be used. Usually, replicas of a single server are executed on separate processors of a distributed system, and protocols are used to coordinate client interactions with these replicas. The physical and electrical isolation of processors in a distributed system ensures that server failures are independent, as required.
» "The state machine approach is a general method for implementing a fault-tolerant service by replicating servers and coordinating client interactions with server replicas. The approach also provides a framework for understanding and designing replication management protocols. Many protocols that involve replication of data or software - be it for masking failures or simply to facilitate cooperation without centralized control - can be derived using the state machine approach. Although few of these protocols actually were obtained in this manner, viewing them in terms of state machines helps in understanding how and why they work."
Preliminaries
State Machine Definition
For the subsequent discussion a State Machine will be defined as the following tuple of values Mealy Machine:
- A set of States
- A set of Inputs
- A set of Outputs
- A transition function (Input x State -> State)
- An output function (Input x State -> Output)
- A distinguished State called Start.
A State Machine begins at the State labeled Start. Each Input received is passed through the transition and output function to produce a new State and an Output. The State is held stable until a new Input is received, while the Output is communicated to the appropriate receiver.
It should be clear that any algorithm can be implemented using this model if driven by an appropriate Input stream. In particular, this discussion requires a State Machine to have the following property:
» Deterministic:
Multiple copies of the same State Machine begun in the Start state, and receiving the same Inputs in the same order will arrive at the same State having generated the same Outputs.
Fault Tolerance Explained
Determinism is an ideal characteristic for providing fault-tolerance. Intuitively, if multiple copies of a system exist, a fault in one would be noticeable as a difference in the State or Output from the others.
A little deduction shows the minimum number of copies needed for fault-tolerance is three; one which has a fault, and two others to whom we compare State and Output. Two copies isn't enough; there's no way to tell which copy is the faulty one.
Further deduction shows a three-copy system can support at most one failure (after which it must repair or replace the faulty copy). If more than one of the copies were to fail, all three States and Outputs might differ, and there would be no way to choose which is the correct one.
Research has shown that in general a system which supports F failures must have 2F+1 copies (also called replicas). The extra copies are used as evidence to decide which of the copies are correct and which are faulty. Special cases can improve these bounds .
All of this deduction pre-supposes that replicas are experiencing only random independent faults such as memory errors or hard-drive crash. Failures caused by replicas which attempt to lie, deceive, or collude are called Byzantine Failures . Both random and Byzantine failures are supported by the State Machine Approach, with isolated changes.
It should be noted that failed replicas are not required to stop; they may continue operating, including generating spurious or incorrect Outputs.
The State Machine Approach
The preceding intuitive discussion implies a simple technique for implementing a fault-tolerant service in terms of a State Machine:
Place copies of the State Machine on multiple, independent servers.
Receive client requests, interpreted as Inputs to the State Machine.
Choose an ordering for the Inputs.
Execute Inputs in the chosen order on each server.
Respond to clients with the Output from the State Machine.
Monitor replicas for differences in State or Output.
The remainder of this article develops the details of this technique.
Step 1 and 2 are outside the scope of this article.
Step 3 is the critical operation, see Ordering Inputs.
Step 4 is covered by the State Machine Definition.
Step 5, see Ordering Outputs.
Step 6, see Auditing and Failure Detection.
The appendix contains discussion on typical extensions used in real-world systems such as Logging, Checkpoints, Reconfiguration, and State Transfer.
Ordering Inputs
The critical step in building a distributed system of State Machines is choosing an order for the Inputs to be processed. Since all non-faulty replicas will arrive at the same State and Output if given the same Inputs, it's imperative that the Inputs are submitted in an equivalent order at each replica. Many solutions have been proposed in the literature .
A Visible Channel is a communication path between two entities actively participating in the system (such as clients and servers).
Example: client to server, server to server
A Hidden Channel is a communication path which isn't revealed to the system.
Example: client to client channels are usually hidden; such as users communicating over a telephone, or a process writing files to disk which are read by another process.
When all communication paths are visible channels and no hidden channels exist, a partial global order (Causal Order) may be inferred from the pattern of communications .
Inputs may be ordered by their position in the series of consensus instances (Consensus Order).
Sending Outputs
Client requests are interpreted as Inputs to the State Machine, and processed into Outputs in the appropriate order. Each replica will generate an Output independently. Non-faulty replicas will always produce the same Output. Before the client response can be sent, faulty Outputs must be filtered out. Typically, a majority of the Replicas will return the same Output, and this Output is sent as the response to the client.
System Failure » If there's no majority of replicas with the same Output, or if less than a majority of replicas returns an Output, a system failure has occurred. The client response must be the unique Output: FAIL.
Auditing and Failure Detection
The permanent, unplanned compromise of a replica is called a Failure. Proof of failure is difficult to obtain, as the replica may simply be slow to respond, or even lie about its status . Cryptographic security isn't required for checksums.
It is possible that the local server is compromised, or that the Audit process is faulty, and the replica continues to operate incorrectly. This case is handled safely by the Output filter described previously (see Sending Outputs).
Appendix: Extensions
Input Log
In a system with no failures, the Inputs may be discarded after being processed by the State Machine. Realistic deployments must compensate for transient non-failure behaviors of the system such as message loss, network partitions, and slow processors is a protocol for solving consensus, and may be used as the protocol for implementing Consensus Order.
Paxos requires a single leader to ensure liveness [. That is, one of the replicas must remain leader long enough to achieve consensus on the next operation of the state machine. System behavior is unaffected if the leader changes after every instance, or if the leader changes multiple times per instance. The only requirement is that one replica remain leader long enough to move the system forward.
» Conflict Resolution]
In general, a leader is necessary only when there's disagreement about which operation to perform [, and if those operations conflict in some way (for instance, if they don't commute) ][.
» When conflicting operations are proposed, the leader acts as the single authority to set the record straight, defining an order for the operations, allowing the system to make progress.]
With Paxos, multiple replicas may believe they're leaders at the same time. This property makes Leader Election for Paxos very simple, and any algorithm which guarantees an 'eventual leader' will work.
Historical background
Leslie Lamport was the first to propose the state machine approach, in his seminal 1984 paper on "Using Time Instead of Timeout In Distributed Systems" . Fred Schneider later elaborated the approach in his paper "Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial" .
'Ken Birman' developed the virtual synchrony model in a series of papers published between 1985 and 1987. The primary reference to this work is "Exploiting Virtual Synchrony in Distributed Systems" , which describes the Isis Toolkit, a system that was used to build the New York and Swiss Stock Exchanges, French Air Traffic Control System, US Navy AEGIS Warship, and other applications.
Recent work by Miguel Castro and Barbara Liskov used the state machine approach in what they call a "Practical Byzantine fault tolerance" architecture that replicates especially sensitive services using a version of Lamport's original state machine approach, but with optimizations that substantially improve performance.
A hard real-time variant of this approach has been developed by Prof. Hermann Kopetz at TU Vienna, Austria, in the "Time-Triggered Architecture" (TTA) based on the Time-Triggered Protocol (TTP) during the 1990's. It has been commercialized in the 2000's by TTTech Computertechnik AG and deployed in various aerospace projects.
Further Information
Get more info on 'State Machine Replication'.
|
External Link Exchanges
Do you know how hard it is to get a link from a large encyclopaedia? Well we're different and will prove it. To get a link from us just add the following HTML to your site on a relevant page:
<a href="http://state_machine_replication.totallyexplained.com">State machine replication Totally Explained</a>
Then simply click through this link from your web page. Our crawlers will verify your link, extract the title of your web page and instantly add a link back to it. If you like you can remove the words Totally Explained and embed the link in article text.
As long as your link remains in place, we'll keep our link to you right here. Please play fair - our crawlers are watching. Your site must be closely related to this one's topic. Any kind of spamming, dubious practises or removing the link will result in your link from us being dropped and, potentially, your whole site being banned. |
|
|